Machine Learning on Statistical Manifold

Authors

  • Bo Zhang
  • Weiqing Gu

Harvey Mudd College

Abstract

This senior thesis project explores and generalizes some fundamental machine learning algorithms from Euclidean space to the statistical manifold, an abstract space in which each point is a probability distribution. In this thesis, we adapt the optimal separating hyperplane, the k-means clustering method, and the hierarchical clustering method for classifying and clustering probability distributions. In these modifications, we use statistical distances as a measure of the dissimilarity between objects. We describe a situation where clustering probability distributions is needed and useful. We present many interesting and promising empirical clustering results, which demonstrate that the statistical-distance-based clustering algorithms often outperform the same algorithms with the Euclidean distance in complex scenarios. In particular, we apply our statistical-distance-based hierarchical and k-means clustering algorithms to univariate normal distributions with k = 2 and k = 3 clusters, bivariate normal distributions with diagonal covariance matrices and k = 3 clusters, and discrete Poisson distributions with k = 3 clusters. Finally, we prove that the k-means clustering algorithm applied to discrete distributions with the Hellinger distance converges not only to a partial optimal solution but also to a local minimum.
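To make the clustering idea concrete, here is a minimal sketch of k-means on discrete distributions with the Hellinger distance, plus the closed-form Hellinger distance between two univariate normals. It assumes the convention H²(p, q) = 1 − Σᵢ √(pᵢqᵢ). The centroid rule (qᵢ proportional to (Σⱼ √pⱼᵢ)², which minimizes the summed squared Hellinger distance over the probability simplex), the truncation of Poisson pmfs to a finite support, and all function names (`hellinger_discrete`, `hellinger_normal`, `poisson_pmf`, `hellinger_centroid`, `kmeans_hellinger`) are illustrative assumptions, not details taken from the thesis.

```python
import math

# Assumed convention: H^2(p, q) = 1 - BC(p, q), where BC is the
# Bhattacharyya coefficient sum_i sqrt(p_i * q_i); H lies in [0, 1].

def hellinger_discrete(p, q):
    """Hellinger distance between two pmfs given on the same finite support."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding errors

def hellinger_normal(mu1, s1, mu2, s2):
    """Closed-form Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    var_sum = s1 ** 2 + s2 ** 2
    bc = math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(-((mu1 - mu2) ** 2) / (4.0 * var_sum))
    return math.sqrt(max(0.0, 1.0 - bc))

def poisson_pmf(lam, k_max):
    """Poisson(lam) pmf on {0, ..., k_max}, renormalized after truncation (assumption)."""
    w = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(k_max + 1)]
    total = sum(w)
    return [x / total for x in w]

def hellinger_centroid(members):
    """Pmf q minimizing sum_j H^2(p_j, q): q_i proportional to (sum_j sqrt(p_j,i))^2."""
    a = [sum(math.sqrt(p[i]) for p in members) for i in range(len(members[0]))]
    norm = sum(x * x for x in a)
    return [x * x / norm for x in a]

def kmeans_hellinger(dists, init_centers, max_iter=100):
    """Lloyd-style k-means on discrete pmfs using the Hellinger distance."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each pmf goes to its nearest center in Hellinger distance.
        new_labels = [
            min(range(len(centers)), key=lambda c: hellinger_discrete(p, centers[c]))
            for p in dists
        ]
        if new_labels == labels:  # assignments no longer change: stop
            break
        labels = new_labels
        # Update step: recompute the Hellinger centroid of each non-empty cluster.
        for c in range(len(centers)):
            members = [p for p, lab in zip(dists, labels) if lab == c]
            if members:
                centers[c] = hellinger_centroid(members)
    return labels, centers

# Usage: nine Poisson distributions forming three well-separated groups of rates.
rates = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0, 20.0, 21.0, 22.0]
pmfs = [poisson_pmf(lam, 60) for lam in rates]
labels, _ = kmeans_hellinger(pmfs, [pmfs[0], pmfs[3], pmfs[6]])
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Working in square-root coordinates is what makes the update step closed-form in this sketch: each pmf p maps to the unit vector √p, and the squared Hellinger distance becomes 1 minus an inner product, so the best centroid is the normalized sum of the members' square-root vectors, squared back into a pmf.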

Similar Sources

Information Geometric Density Estimation

We investigate kernel density estimation where the kernel function varies from point to point. Density estimation in the input space means to find a set of coordinates on a statistical manifold. This novel perspective helps to combine efforts from information geometry and machine learning to spawn a family of density estimators. We present example models with simulations. We discuss the princip...

Dissimilarity Data in Statistical Model Building and Machine Learning

We explore three papers concerned with two methods for incorporating discrete, noisy, incomplete dissimilarity data into statistical/machine learning models for supervised, semisupervised or unsupervised machine learning. The two methods are RKE (Regularized Kernel Estimation), and RMU (Regularized Manifold Unfolding). Briefly put, the methods use dissimilarity information between objects in a ...

Some Research Problems in Metric Learning and Manifold Learning

In the past few years, metric learning, semi-supervised learning, and manifold learning methods have aroused a great deal of interest in the machine learning community. Many machine learning and pattern recognition algorithms rely on a distance metric. Instead of choosing the metric manually, a promising approach is to learn the metric from data automatically. Besides some early work on metric ...

Multiscale Dictionary Learning: Non-Asymptotic Bounds and Robustness

High-dimensional datasets are well-approximated by low-dimensional structures. Over the past decade, this empirical observation motivated the investigation of detection, measurement, and modeling techniques to exploit these low-dimensional intrinsic structures, yielding numerous implications for high-dimensional statistics, machine learning, and signal processing. Manifold learning (where the l...

Tensor Balancing on Statistical Manifold

We solve tensor balancing, rescaling an Nth-order nonnegative tensor by multiplying N tensors of order N − 1 so that every fiber sums to one. This generalizes a fundamental process of matrix balancing used to compare matrices in a wide range of applications from biology to economics. We present an efficient balancing algorithm with quadratic convergence using Newton’s method and show in numeric...

Publication date: 2017